System and method for hydrating graph databases from external data

ABSTRACT

One example method includes receiving, by a graph proxy, an event from an event generator, and the event includes information about an IO and information about data affected by the IO, comparing, by the graph proxy, the data to a schema, when the data is determined by the graph proxy to map to the schema, identifying, by the graph proxy, a rule that is associated with the event, and the rule specifies performance of an action when a condition is met, and when the condition is met, performing, by the graph proxy, the action, and the action is performed with respect to a graph.

FIELD OF THE INVENTION

Embodiments of the present invention generally relate to hydration, that is, population of graphs and graph databases. More particularly, at least some embodiments of the invention relate to systems, hardware, software, computer-readable media, and methods for hydrating, automatically in some embodiments, graphs, and other graphical representations of data and/or information, using data such as underlying file and object data.

BACKGROUND

Graphs are a fundamental element in storing information about data assets and their relationships. Graphs are key elements in workflows such as data science, analytics, and machine learning. It is generally difficult to hydrate, that is, populate, a graph from underlying file and object data. For example, most data users undertake a largely manual, bespoke effort to have such data represented in the graph. Such difficulty, in turn, creates delays in executing such workflows.

As illustrated by the following points, conventional processes for hydrating a graph are difficult, time-consuming, and error prone. For example, users often must craft custom solutions to extract and load data into a graph, including their relationships and properties. As another example, conventional approaches do not provide near real-time updates of the graph. Such delays in updating may be due to the data extract and load logic that is run in a batch type of processing. As a final example, existing knowledge graphs, enterprise knowledge graphs, and public and private ontologies are not utilized to their full capacity, and sometimes not at all. This leads to terminology confusion across graph systems. This confusion, in turn, leads to a drop in the ability to take full advantage of graphs, and not just graph databases, but overall graph algorithms such as traversals, clustering, merging, and subgraphs, for example.

BRIEF DESCRIPTION OF THE DRAWINGS

To describe the manner in which at least some of the advantages and features of the invention may be obtained, a more particular description of embodiments of the invention will be rendered by reference to specific embodiments thereof which are illustrated in the appended drawings. Understanding that these drawings depict only typical embodiments of the invention and are not therefore to be construed to limit its scope, such typical embodiments of the invention will be described and explained with additional specificity and detail through the use of the accompanying drawings.

FIG. 1 discloses aspects of an example entity for generating and emitting events.

FIG. 2 discloses aspects of an example graph proxy.

FIG. 3 discloses aspects of an example object which may be processed by some embodiments.

FIG. 4 discloses aspects of an example parent schema that references an example child schema.

FIG. 5 discloses an example vertex rule.

FIG. 6 discloses example edge rules.

FIG. 7 discloses example graphs that may be generated by some embodiments.

FIG. 8 discloses an example method for using emitted events to modify a graph.

FIG. 9 discloses aspects of an example computing entity operable to perform any of the disclosed methods, processes, and operations.

DETAILED DESCRIPTION OF SOME EXAMPLE EMBODIMENTS

Embodiments of the present invention generally relate to hydration, that is, population of graphs and graph databases. More particularly, at least some embodiments of the invention relate to systems, hardware, software, computer-readable media, and methods for hydrating, automatically in some embodiments, graphs, and other graphical representations of data and/or information, using data such as underlying file and object data. At least some embodiments embrace a horizontal solution for hydrating a graph with information from files and objects by providing facilities to allow the user to dictate how the data should be loaded. Note that as used herein, ‘hydrating’ and ‘hydration’ and ‘hydrate,’ embrace, but are not limited to, processes for loading data and relationships into a graph or other visually perceptible representation of data and/or information.

In general, some example embodiments of the invention provide an architecture by which data from files and objects can be automatically loaded into a graph to help accelerate various workflows that employ the graph and/or the underlying data, files, and objects. Some embodiments enable a transmitting entity such as a computing system hardware and/or software element, application, or computing process, for example, to emit an event when data is created, modified, or deleted. A receiving entity, which may be referred to herein as a graph proxy, may listen for such events, and then act, possibly automatically, on received events. For example, the receiving entity may interact, directly or indirectly, with one or more graph databases for the purpose of creating, modifying, or deleting, a record in the graph database, and implementing corresponding changes in a graph, where such changes may include, for example, creation of graphical elements, of one or more graphs, where the graphical elements comprise, and/or represent, vertices, edges, and properties, included in and/or implicated by the graph(s). Embodiments of the invention may also enable a user, and/or a machine learning (ML) algorithm, to generate various schemas and rules for the handling of incoming events. In general, the schemas and rules may enable, in response to receipt of an emitted event or stream of emitted events, the automatic generation, automatic modification, and/or, automatic deletion, of graph elements including, but not limited to, vertices, edges, and properties. In this way, the graph may be updated in near-real time as events are emitted.

Embodiments of the invention, such as the examples disclosed herein, may be beneficial in a variety of respects. For example, and as will be apparent from the present disclosure, one or more embodiments of the invention may provide one or more advantageous and unexpected effects, in any combination, some examples of which are set forth below. It should be noted that such effects are neither intended, nor should be construed, to limit the scope of the claimed invention in any way. It should further be noted that nothing herein should be construed as constituting an essential or indispensable element of any invention or embodiment. Rather, various aspects of the disclosed embodiments may be combined in a variety of ways to define yet further embodiments. Such further embodiments are considered as being within the scope of this disclosure. As well, none of the embodiments embraced within the scope of this disclosure should be construed as resolving, or being limited to the resolution of, any problem(s). Nor should any such embodiments be construed to implement, or be limited to implementation of, any technical effect(s) or solution(s). Finally, it is not required that any embodiment implement any of the advantageous and unexpected effects disclosed herein.

One advantageous aspect of at least some embodiments of the invention is that graphs may be generated, and modified, automatically in response to detection of one or more emitted events. Embodiments may provide for ongoing automatic updates to a graph. Embodiments may provide for use of an ML model to generate and/or modify rules and/or schemas for handling emitted events. Various other advantageous aspects of some example embodiments are disclosed elsewhere herein. Embodiments may improve the speed and accuracy of processes that rely for their successful performance on a graph and/or the information contained in the graph.

It is noted that embodiments of the invention, whether claimed or not, cannot be performed, practically or otherwise, in the mind of a human. Accordingly, nothing herein should be construed as teaching or suggesting that any aspect of any embodiment of the invention could or would be performed, practically or otherwise, in the mind of a human. Further, and unless explicitly indicated otherwise herein, the disclosed methods, processes, and operations, are contemplated as being implemented by computing systems that may comprise hardware and/or software. That is, such methods, processes, and operations, are defined as being computer-implemented.

A. Overview

Graphs may be a fundamental element for storing information about entities such as, but not limited to, data assets and their relationships. Graphs may be key elements to workflows such as data science, analytics, and machine learning. Graphs may contain three primary classes of data. The first of these is vertices. In some embodiments, a vertex in a graph may represent a data asset, although the scope of the invention is not so limited. The second class of data that may be included in a graph is an edge. In general, an edge includes dimensions of a relationship between two vertices of a graph. An edge may have at least two dimensions. The first dimension is a direction that indicates whether a relationship between vertices is one way, or bidirectional. The second dimension is a type, such as ‘Drives’ for example, that comprises a classification of the edge, that is, of the relationship between two vertices. The third class of data that may be included in a graph is properties. A property provides additional metadata about a vertex or additional metadata about an edge. Properties are useful to the application using the graph data. The following example illustrates some of these concepts.

In this example, there are various elements that are related in various ways by one or more vertices, edges, and properties. For example, ‘Joel’ (a human) and ‘Toyota’ (motor vehicle) are vertices, and ‘Drives’ is an edge that connects ‘Joel’ to ‘Toyota.’ This edge may indicate that ‘Joel’ ‘Drives’ a ‘Toyota.’ This example edge is one-directional from ‘Joel’ to ‘Toyota’ since the ‘Toyota’ cannot ‘Drive’ ‘Joel.’ In this example, ‘Joel’ may have additional properties such as ‘Hair: Brown,’ and ‘Toyota’ might have a property such as ‘Model: Highlander.’ Finally, the ‘Drives’ edge might have some properties such as ‘Frequency: Low.’
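By way of illustration only, the three classes of graph data described above might be sketched in Python as follows. The class names `Vertex` and `Edge`, the field names, and the helper `connects` are hypothetical and are not drawn from any particular embodiment or graph database.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    """A graph vertex, such as a data asset, with optional properties."""
    name: str
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    """A relationship between two vertices, with a direction and a type."""
    source: str
    target: str
    edge_type: str
    bidirectional: bool = False
    properties: dict = field(default_factory=dict)


def connects(edge, a, b):
    """Return True if the edge can be followed from vertex a to vertex b."""
    if edge.source == a and edge.target == b:
        return True
    # A bidirectional edge can also be followed in the reverse direction.
    return edge.bidirectional and edge.source == b and edge.target == a


joel = Vertex("Joel", {"Hair": "Brown"})
toyota = Vertex("Toyota", {"Model": "Highlander"})
drives = Edge("Joel", "Toyota", "Drives", properties={"Frequency": "Low"})
```

In this sketch, the ‘Drives’ edge can be followed from ‘Joel’ to ‘Toyota’ but not in the opposite direction, which reflects the one-directional relationship described above.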

B. Aspects of Some Example Embodiments

Attention is directed now to various example embodiments of the invention. As noted earlier, typical data users undertake a largely manual, bespoke effort to have such data represented in the graph. Thus, example embodiments of the invention may eliminate the need for manual processes, as well as the problems that attend such processes, through implementation of a horizontal solution for hydrating a graph with information from files and objects by providing facilities to allow the user to dictate how the data should be loaded into the graph. As such, embodiments may improve the speed and efficiency with which graph-centered workflows are performed. For example, to enable execution of graph-centered workflows, data must be present in the graph. Further, data stored in the graph must have some meaningful properties and relationships to be useful. Much of enterprise data is semi-structured, such as in the form of JSON (JavaScript Object Notation), XML (Extensible Markup Language), or CSV (Comma-Separated Value format), for example, and stored in file servers and object repositories, examples of which include the cloud-based Dell ECS (Elastic Cloud Storage) and Amazon S3. Example embodiments may capture, and render in graphical form, such properties and relationships between/among data.

It is noted that as used herein, the term ‘data’ is intended to be broad in scope. Thus, that term embraces, by way of example and not limitation, data segments such as may be produced by data stream segmentation processes, data chunks, data blocks, atomic data, emails, objects of any type, files of any type including media files, word processing files, spreadsheet files, and database files, as well as contacts, directories, sub-directories, volumes, and any group of one or more of the foregoing.

Example embodiments of the invention are applicable to any system capable of storing and handling various types of objects, in analog, digital, or other form. Although terms such as document, file, segment, block, or object may be used by way of example, the principles of the disclosure are not limited to any form of representing and storing data or other information. Rather, such principles are equally applicable to any object capable of representing information.

With particular attention now to FIG. 1, a configuration and process flow are disclosed in which a system of record 100 receives an IO (Input/Output operation) 102, such as a write request for a new object. As shown, the system of record 100 is broad in scope and may embrace, but is not limited to, an application, file server, object store, and/or other entity. In response to receipt of the IO 102, the system of record 100 may automatically generate, and emit, an event 104 that corresponds to the IO 102. The events 104 may be generated and emitted in real time as IOs 102 are received or may be generated and retained in a buffer for a period of time before being emitted. In the latter case, the events 104 may be stored in different buffers according to the characteristics of the respective data that were the catalyst for generation of the various events 104. The events 104 may be emitted by the system of record 100 over any of a variety of mediums including, but not limited to, webhook invocations, message queues, remote procedure calls, RESTful APIs, or otherwise. The event 104 may include, and/or refer to, one or more objects or other data that were affected by the IO 102.

Further, the IO 102 may comprise any operation(s) that operates on data. Examples of IOs thus include, but are not limited to, reads, writes, deletes, replications, data transfers, object writes, file writes, and changes to a state of the data. In addition to creating the event 104, the system of record 100 may also generate metadata about the event 104, and metadata about the data change, that is, the IO 102, that triggered generation of the event 104. The metadata about the event, and/or the metadata about the data, may comprise, for example, one or more properties of the data, and/or properties of the entity that generated the IO 102.
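As a minimal sketch, and assuming a simple in-memory store, the generation and emission of an event in response to an IO might resemble the following Python. The class `SystemOfRecord`, the function `make_event`, the `emit` callback, and the event field names are all hypothetical; an actual system of record might instead emit events by way of a webhook, message queue, or other medium noted above.

```python
import time
import uuid


def make_event(operation, bucket, obj):
    """Build an event carrying metadata about the IO and the affected data."""
    return {
        "event_id": str(uuid.uuid4()),   # metadata about the event itself
        "timestamp": time.time(),
        "io": {"operation": operation, "bucket": bucket},  # metadata about the IO
        "data": obj,                     # the object affected by the IO
    }


class SystemOfRecord:
    """Hypothetical in-memory store that emits an event for each IO."""

    def __init__(self, emit):
        self.emit = emit   # callback standing in for a webhook, queue, etc.
        self.store = {}

    def write(self, bucket, key, obj):
        self.store[(bucket, key)] = obj
        self.emit(make_event("write", bucket, obj))

    def delete(self, bucket, key):
        obj = self.store.pop((bucket, key), None)
        self.emit(make_event("delete", bucket, obj))


received = []  # stands in for a listening graph proxy
system_of_record = SystemOfRecord(emit=received.append)
system_of_record.write("menu", "133", {"MenuItemID": 133, "Name": "Big League Breakfast"})
```

Here the events are emitted immediately as each IO is handled. A buffered variant, as also contemplated above, could collect events and emit them later.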

Turning next to FIG. 2, a system and/or process, which may or may not be implemented as an element of the system of record 100, may be provided that comprises a graph proxy 200. In general, the graph proxy 200 may listen for events 104 and take action concerning the incoming events 104. The event 104 may include, and/or may reference, the object(s) or other elements that were the subject of the IO(s) 102.

As well, the graph proxy 200 may also interact, such as by way of a graph interface 201, with a graph database 202 for the purpose of creation, modification, and deletion, of one or more database records. That is, the graph database 202 may include various records that may be modified or deleted and, further, records may be added to the graph database 202. Such modification, creation, or deletion may be performed by, and/or, at the direction of, the graph proxy 200. Example records that may be stored in the graph database 202 include, but are not limited to, vertices, edges, and properties.

The graph database 202 may be automatically queried by a graph generator 203 that accesses graph database 202 records to generate graphs based on those records. The graph generator 203 may, or may not, be combined with the graph database 202 in a single computing entity that is able to create, modify, or delete records, and also to generate graphs based on the records. In some embodiments, the graph generator 203 may query the graph database 202 on a recurring basis to help ensure that one or more graphs generated based on those records are kept up to date. Graphs generated by the graph generator 203 may be viewed by a user with the UI 212 and/or the graphs may be transmitted by the graph generator 203 to one or more entities.

As disclosed in FIG. 2, the graph proxy 200 may comprise various modules. Such modules may include, but are not limited to, a rules module 204, schema detection module 206, vertex creation module 208, and an edge creation module 210. These modules may, but need not, all reside in the graph proxy 200. In other embodiments, any one or more of the modules may operate as a standalone entity that may be configured to communicate with the other modules, and with the graph proxy 200.

In some embodiments, a user may predefine, such as by way of a UI 212 (User Interface) that may be, for example, a GUI (Graphical User Interface) or a CLI (Command Line Interface), a set of schema rules, using the rules module 204, in the graph proxy 200 that may be used by the schema detection module 206 to identify the schema of incoming objects, or other data, and objects, or other data, identified by an incoming event 104. In some embodiments, the schema rules may be defined, and/or modified, by a trained ML algorithm without human input. A schema of data that has been added or modified, for example, may be identified by the schema detection module 206 in various ways. Note that, in general, any user interaction with the graph proxy 200 may take place by way of the UI 212.

For example, the schema detection module 206 may identify a schema of an object by implication, that is, a received event 104 that has specific metadata criteria matches would be associated with a given schema. For instance, all objects received by the graph proxy 200 that were written, according to an IO such as the IO 102, to bucket ‘person’ should conform to the schema that a user or other entity has defined for ‘person.’ In addition, or as an alternative, to identification by implication, the schema detection module 206 may identify a schema for an object by association, that is, objects received by, or at least identified to, the graph proxy 200, may have their respective schema derived, by the schema detection module 206, from the object content. The derived schema may then be compared, by the schema detection module 206, against a set of schemas, previously uploaded by the user or other entity, to identify a match between the detected schema and the prior schemas.
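The two approaches to schema identification might be sketched as follows. This is an illustrative assumption only: the schema registry `SCHEMAS`, the mapping `BUCKET_TO_SCHEMA`, the event layout, and the function names are hypothetical, and a schema is reduced here to a set of field names for brevity.

```python
# Hypothetical registry of previously uploaded schemas, keyed by name.
SCHEMAS = {
    "person": {"Name", "Hair"},
    "vehicle": {"Make", "Model"},
}

# Hypothetical metadata criterion: objects written to a bucket imply a schema.
BUCKET_TO_SCHEMA = {"person": "person"}


def detect_by_implication(event):
    """Infer the schema from event metadata, here the target bucket."""
    return BUCKET_TO_SCHEMA.get(event["io"]["bucket"])


def detect_by_association(event):
    """Derive a schema from the object content and match it to known schemas."""
    derived = set(event["data"].keys())
    for schema_name, fields in SCHEMAS.items():
        if derived == fields:
            return schema_name
    return None


def detect_schema(event):
    """Try implication first, then fall back to association."""
    return detect_by_implication(event) or detect_by_association(event)
```

For instance, an object written to the ‘person’ bucket would be associated with the ‘person’ schema by implication, while an object written elsewhere whose fields are exactly ‘Make’ and ‘Model’ would be matched to the ‘vehicle’ schema by association.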

Schema definitions may be proprietary, or may leverage a standard mechanism, such as JSON (JavaScript Object Notation)/JSON schema, XML (Extensible Markup Language)/XSD (XML Schema Definition), unified modeling language, resource description framework, or other. Example proprietary schemas are discussed below. Finally, it should be noted that example embodiments may use implication and/or derivation to determine the schema of an object.

With continued attention to FIG. 2, a user or other entity, such as a trained ML algorithm for example may, such as by way of the UI 212, define, or cause the definition of, one or more vertices and/or one or more edges. The vertices and edges may be defined, respectively, using the vertex creation module 208 and the edge creation module 210. In one example embodiment, a user may predefine, using the vertex creation module 208, a set of vertex rules in the graph proxy 200 to specify how and when a vertex should be created in a graph to represent the data contained within the object or event identified, and/or received, by way of the event 104 and/or one or more other events 104.

As well, in example embodiments, a user or other entity, such as a trained ML algorithm, may be able to specify which properties from the source, that is, the source of the object(s) and other data/metadata included in, or referenced by, one or more emitted events 104, should be carried over as properties in the created vertex. Further, the user may be able to specify one or more conditions upon which the vertex should, or should not, be created.

A user or other entity, such as a trained ML algorithm, may use the edge creation module 210 to define a set of edge creation rules in the graph proxy 200 to specify how and when one or more edges should, or should not, be created based on properties and/or other metadata within the source data, that is, properties and metadata in the data received in, or identified by, one or more events 104 received at the graph proxy 200. Since edges are relationships created between vertices, it may be the case in some circumstances that certain vertices may need to be created in a graph a priori, that is, without reliance on, or reference to, any prior observations or experience. For instance, creation of a vertex for source data ‘Joel’ may dictate that a relationship, that is, an edge, also be connected to a pre-created vertex called ‘Person,’ and in this example, ‘Person’ would be created a priori in the graph, and be referenced within the rule.

C. Further Discussion

As will be apparent from this disclosure, example embodiments may implement a variety of features and functionalities. For example, a file, object, or other data, may be automatically represented in a graph database, along with any relationships of that file, object, or other data, to other files, objects, and data. As another example, the conditions under which a file, object, or other data, is automatically represented in a graph database can be programmatically defined and implemented by a user and/or by an ML algorithm. Further, the conditions by which relationships amongst vertices represented in a graph database may be programmatically defined by a user, and such relationships may be automatically built by an entity such as an ML algorithm on behalf of the user. Another example feature of some embodiments is the dramatic decrease, relative to conventional approaches, in the time between the creation of new data on a system of record and when the data becomes useful for cases such as analytics, data science, and machine learning. By enabling this event-based system, example embodiments may allow for a more near real-time system which enables multiple use cases that are not possible today. As a final example, embodiments may employ layering and utilization of multiple ontologies and knowledge graphs to create a fully linkable set of graphs. This approach may enable interoperating subgraphs and graph systems that create optimal sharing of data and knowledge generation.

D. Aspects of Some Illustrative Examples

With reference now to FIGS. 3-7, details are provided concerning some illustrative examples of embodiments of the invention. In FIG. 3, an object 300, in the form of a breakfast menu item, is shown. In this illustrative example, object data for breakfast menu items may be automatically loaded into a graph, and relationships may be created under the following circumstances: 1) the meal contains eggs, and 2) the meal is considered high calorie or low calorie.

As shown, the breakfast menu item object 300 may have various attributes or properties such as, for example, a ‘menu item ID’ (133), a ‘name’ (Big League Breakfast), a ‘number of calories’ (1500), a ‘price’ ($13.95), and an ‘includes’ property that lists various item IDs and items included in the Big League Breakfast. That is, the ‘includes’ property indicates that the Big League Breakfast includes ‘Eggs’ (‘ItemID’ 134) with ‘Quantity’ 3, ‘Pancakes’ (‘ItemID’ 135) with ‘Quantity’ 3, and ‘Bacon’ (‘ItemID’ 136) with ‘Quantity’ 2.

A pre-defined set of schema rules may be created by a user, to which incoming objects, such as the breakfast menu item object 300, would conform as described above, that is, through implication and/or association. As noted in FIG. 4, the schema 302 that has been defined for the Big League Breakfast is assigned a ‘SchemaID’ of 5, and also references another schema 304, with ‘SchemaID’ 6, that specifies the items included in the Big League Breakfast. That is, the schema 304 that has been designated ‘SchemaID’ 6 lists the items that must be included in the ‘includes’ property of an object for that object to be determined as conforming with schema 304. Note that an object may conform with schema 304, but not conform with schema 302, and vice versa.

Thus, an object that includes all the properties, and only the properties, indicated in the schemas 302 and 304 may be considered as having mapped to schema 302. On the other hand, if an object includes all, and only, the fields in ‘SchemaID’ 5 but the ‘includes’ field of that object lists only 2 items instead of 3, as required by ‘SchemaID’ 5, then the object is considered to map neither to ‘SchemaID’ 5 nor to ‘SchemaID’ 6. That is, all the properties included in, and referenced by, a schema, must be included in an object for that object to be deemed as mapping to that schema. In examples such as this, the schema 302 with ‘SchemaID’ 5 may be referred to as a ‘parent’ schema, and the schema 304 with ‘SchemaID’ 6 may be referred to as a ‘child’ schema. A child schema may be incorporated into a parent schema by the reference of the parent schema to that child schema.
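The parent/child mapping check described above might be sketched as follows. Because the content of FIG. 4 is not reproduced here, the schema layout is an assumption: the field names, the `children`, `entry_fields`, and `entry_count` keys, and the functions `maps_to_child` and `maps_to_parent` are hypothetical.

```python
# Hypothetical encodings of the parent schema ('SchemaID' 5) and the
# child schema ('SchemaID' 6) that the parent references.
SCHEMAS = {
    5: {
        "fields": {"MenuItemID", "Name", "Calories", "Price", "Includes"},
        "children": {"Includes": 6},  # the 'Includes' field must map to schema 6
    },
    6: {
        "entry_fields": {"ItemID", "Item", "Quantity"},
        "entry_count": 3,  # assumed: three included items are required
    },
}


def maps_to_child(entries, schema_id):
    """Check that a list of included items maps to the child schema."""
    schema = SCHEMAS[schema_id]
    if len(entries) != schema["entry_count"]:
        return False
    return all(set(entry) == schema["entry_fields"] for entry in entries)


def maps_to_parent(obj, schema_id):
    """Check that an object has all, and only, the parent's fields, and
    that each referenced child schema is also satisfied."""
    schema = SCHEMAS[schema_id]
    if set(obj) != schema["fields"]:
        return False
    return all(
        maps_to_child(obj[field_name], child_id)
        for field_name, child_id in schema["children"].items()
    )


big_league = {
    "MenuItemID": 133,
    "Name": "Big League Breakfast",
    "Calories": 1500,
    "Price": 13.95,
    "Includes": [
        {"ItemID": 134, "Item": "Eggs", "Quantity": 3},
        {"ItemID": 135, "Item": "Pancakes", "Quantity": 3},
        {"ItemID": 136, "Item": "Bacon", "Quantity": 2},
    ],
}
```

Under these assumptions, `maps_to_parent(big_league, 5)` succeeds, while a copy of the object whose ‘Includes’ list holds only two items maps to neither schema.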

As indicated in FIG. 4, the combination of properties of the object 300, which may be received as part of an IO, may be compared to the applicable schema(s) to determine if that object 300 maps to such schema(s) or not. If the object 300 does not map to a particular schema, a new schema may be created, possibly automatically, to which the object maps. Alternatively, the unmapped object may not be further processed and, as such, may not be reflected in a graph that identifies objects that do map to that schema. For example, if an object concerns a car tire, then the object will not map to a schema that defines a breakfast menu item, and thus that object may not be processed further.

Turning next to FIG. 5, details are provided concerning creation of vertices after a determination has been made that an object maps to one or more schemas. Various rules, such as the example rule 400, may be defined for the creation of vertices. The rules for creating vertices may be preconfigured by a user, but that is not necessarily required. In this example, if an object maps to the schema 302 with ‘SchemaID’ 5, and by implication, maps to schema 304 as well, a vertex may be created. It is noted that after a successful mapping of an object to one or more schemas, A) several conditions may exist, and B) the action taken may include further actions beyond simply creating the vertex, such as specifying which properties to carry over into the vertex and potentially external information that could be gathered to include in the properties of the vertex.

With reference to the example rule 400, if an object maps to the schema 302, that is, the schema whose ‘SchemaID’ is 5, then a vertex may be created. On the other hand, if an object does not map to the schema 302, then the rule 400 provides that a vertex will not be created.
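A vertex rule of this kind might be sketched as follows. The dictionary layout of the rule, the `carry_over` list, the in-memory `graph` structure, and the function `apply_vertex_rule` are assumptions made for illustration, and do not reproduce the format of rule 400 shown in FIG. 5.

```python
# Hypothetical encoding of a vertex rule: create a vertex when the object
# maps to 'SchemaID' 5, carrying over selected properties.
VERTEX_RULE_400 = {
    "SchemaID": 5,
    "carry_over": ["Name", "Calories", "Price"],
}


def apply_vertex_rule(rule, obj, matched_schema_id, graph):
    """Create a vertex for obj if the rule's condition is met.

    Returns the new vertex ID, or None when no vertex is created.
    """
    if matched_schema_id != rule["SchemaID"]:
        return None  # condition not met: no vertex is created
    vertex_id = obj["MenuItemID"]
    graph["vertices"][vertex_id] = {
        key: obj[key] for key in rule["carry_over"] if key in obj
    }
    return vertex_id


graph = {"vertices": {}, "edges": []}
menu_item = {
    "MenuItemID": 133,
    "Name": "Big League Breakfast",
    "Calories": 1500,
    "Price": 13.95,
}
created = apply_vertex_rule(VERTEX_RULE_400, menu_item, 5, graph)
```

The schema match is passed in as `matched_schema_id` so that the rule remains independent of however the schema was determined, whether by implication or by association.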

With the schema and vertex rules in place, edge rules may then be created. For this example, two edges are to be conditionally created. With reference to the examples of FIG. 6, and particularly the edge creation rule 500, if the object 300 maps to schema 302 and the breakfast menu item contains eggs, a relationship, denoted as an edge in a graph, should, according to the ‘TargetID’ parameter of rule 500, be created between the newly created vertex and the pre-existing ‘Eggs’ vertex.

Similarly, and referring next to the edge creation rule 600, if the breakfast menu item has over 1,000 calories, an edge should, according to the ‘TargetID’ parameter (value 52) of the rule 600, be created between the newly created vertex and the pre-existing ‘High Calorie’ vertex. On the other hand, if the breakfast menu item has 1,000 calories or less, an edge should, according to the ‘TargetID’ parameter (value 54) be created between the newly created vertex and the pre-existing ‘Low Calorie’ vertex.
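The conditional edge creation described for rules 500 and 600 might be sketched as follows. The rule layout and the function names are hypothetical. The ‘TargetID’ values 52 and 54 are taken from the description above, whereas the ‘TargetID’ of the ‘Eggs’ vertex is not stated, so the value 50 used here is an assumption, as is the edge type ‘Is’ for the calorie edges.

```python
EGGS_VERTEX_ID = 50          # assumed value; not given in the description
HIGH_CALORIE_VERTEX_ID = 52  # 'TargetID' stated for rule 600
LOW_CALORIE_VERTEX_ID = 54   # 'TargetID' stated for rule 600


def contains_eggs(obj):
    """Condition for rule 500: the menu item includes eggs."""
    return any(entry["Item"].lower() == "eggs" for entry in obj["Includes"])


# Each rule pairs a condition with the pre-existing vertex to connect to.
EDGE_RULES = [
    {"RuleID": 500, "condition": contains_eggs,
     "TargetID": EGGS_VERTEX_ID, "type": "Contains"},
    {"RuleID": 600, "condition": lambda obj: obj["Calories"] > 1000,
     "TargetID": HIGH_CALORIE_VERTEX_ID, "type": "Is"},
    {"RuleID": 600, "condition": lambda obj: obj["Calories"] <= 1000,
     "TargetID": LOW_CALORIE_VERTEX_ID, "type": "Is"},
]


def apply_edge_rules(obj, source_vertex_id, rules=EDGE_RULES):
    """Return (source, type, target) edges for every rule whose condition holds."""
    return [
        (source_vertex_id, rule["type"], rule["TargetID"])
        for rule in rules
        if rule["condition"](obj)
    ]


big_league = {
    "Calories": 1500,
    "Includes": [
        {"ItemID": 134, "Item": "Eggs", "Quantity": 3},
        {"ItemID": 135, "Item": "Pancakes", "Quantity": 3},
        {"ItemID": 136, "Item": "Bacon", "Quantity": 2},
    ],
}
edges = apply_edge_rules(big_league, source_vertex_id=133)
```

With these inputs, the sketch yields one edge to the ‘Eggs’ vertex and one edge to the ‘High Calorie’ vertex. A menu item with exactly 1,000 calories would instead be connected to the ‘Low Calorie’ vertex.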

To briefly summarize, and with reference now to the example graphs/graph databases 700 of FIG. 7, when the event is received for an object that corresponds to the newly-created ‘Big League Breakfast’ menu item, a schema of that object may be discerned, through implication and/or association, a vertex may be created in the graph for the menu item, and one or more edges may be created to ‘Eggs’ and ‘High Calorie’ as shown in the example graphs/graph databases 700A and 700B of FIG. 7.

More particularly, and as noted earlier, ‘Eggs’ may be a pre-defined vertex that was included in a graph and/or graph database prior to receipt of a menu item object, such as the menu item object 300. Subsequently, and as discussed above, a menu item object named ‘Big League Breakfast’ may be implemented that maps to a particular schema of interest and, by virtue of such mapping, is the basis for addition of the vertex 704 to the graphs/graph databases 700. Because analysis of the ‘Big League Breakfast’ object indicated that one of the items included in that object is ‘Eggs,’ an edge 706 is created in the graphs/graph databases 700 that indicates a relationship between the ‘Big League Breakfast’ vertex 704 and the ‘Eggs’ vertex 702. In the example of FIG. 7, the edge 706 is denoted ‘Contains’ since the Big League Breakfast includes eggs. The edge 706 is one-way however, since while the Big League Breakfast includes eggs, it does not follow that eggs include a Big League Breakfast.

Note that because the Big League Breakfast is defined as including certain items, such as eggs, each of those items may have a pre-defined vertex included in the graph so that, upon creation of the Big League Breakfast object, that object is able to refer, in the graph, to items included in that menu item. Other pre-defined vertices could be defined and implemented that may, or may not, constitute elements of a Big League Breakfast. For example, bacon may be the subject of a pre-defined vertex, and the same may be true for sausage, but since sausage is not in the Big League Breakfast, the vertex 704 would not be connected to sausage. However, a different breakfast might refer to the pre-defined sausage vertex. Thus, a grouping of pre-defined vertices may be employed which may be respectively associated with one or more breakfast menu items.

With continued reference to the example of FIG. 7, a graph/graph database 700B may include a ‘high calorie’ vertex 708 which, like the ‘eggs’ vertex 702, may be a pre-defined and implemented vertex. Because it is known from the object 300 that the calorie count for the Big League Breakfast is 1500, the vertex 704 may be connected to the vertex 708 by an edge 710 according to the edge rule 600 which specifies that any calorie count greater than 1000 is considered ‘high calorie.’

If there are external food ontologies, this data may additionally be sent to another graphing system aware of those ontologies to generate vertices, and relationships based on those ontologies. This may allow the graph to be built or generated with terminology and relationships that are impactful to the end users in accordance with predefined schemas, consistent naming, and consistent structures. Patterns such as this may increase the ability to generate knowledge and increase intelligence. Having this knowledge to model against may simplify AI models and feature vectors.

As the example of FIG. 7 illustrates, objects associated with incoming events, which may be events in a continuous incoming event stream, may be quickly processed and included in one or more graphs through the use of schemas, and the application of vertex rules and edge rules. Graphs may be both created, and modified, using such schemas and rules. The example of FIG. 7 also illustrates that relationships between data, such as objects for example, may be quickly rendered in a graphical form that is visually perceptible by a human and/or by devices such as video cameras and scanners. At least because they may be quickly generated and updated, graphs according to example embodiments may be used to improve the speed and efficiency of computing processes that rely on the graphical representation of objects, relationships, object properties, and relationship properties.

E. Example Methods

It is noted with respect to the example method of FIG. 8, that any of the disclosed processes, operations, methods, and/or any portion of any of these, may be performed in response to, as a result of, and/or, based upon, the performance of any preceding process(es), methods, and/or, operations. Correspondingly, performance of one or more processes, for example, may be a predicate or trigger to subsequent performance of one or more additional processes, operations, and/or methods. Thus, for example, the various processes that may make up a method may be linked together or otherwise associated with each other by way of relations such as the examples just noted. Finally, and while it is not required, the individual processes that make up the various example methods disclosed herein are, in some embodiments, performed in the specific sequence recited in those examples. In other embodiments, the individual processes that make up a disclosed method may be performed in a sequence other than the specific sequence recited.

Directing attention now to FIG. 8, an example method 800 is disclosed. Initially, a system, such as a system of record or other entity, may listen for, and receive 802, an IO stream from an application, or group of applications, or any other data generator, that is, any other system or device operable to generate new and/or modified data. The IO may, for example, add an object, delete an object, move an object, or modify an object.

The system may parse or otherwise evaluate the IOs to determine the nature of the operation to be performed, and the data that will be affected by performance of that operation. Based on this parsing or other analysis, the system may generate and transmit an event 804. The event may include information about the operation specified in the IO and/or the data affected by that operation.
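As a minimal sketch of the receiving 802 and event generation 804 just described, an IO might be parsed into an event as shown below. The dictionary layout of the IO (keys `op` and `object`), the `Event` class, and the function name `io_to_event` are assumptions made for illustration; the disclosure does not prescribe a particular IO or event format.

```python
from dataclasses import dataclass

# Operations named in the description of the IO: add, delete, move, modify.
SUPPORTED_OPERATIONS = {"add", "delete", "move", "modify"}

@dataclass
class Event:
    operation: str  # information about the IO
    data: dict      # information about the data affected by the IO

def io_to_event(io: dict) -> Event:
    """Parse a hypothetical IO record and build the event to transmit."""
    operation = io.get("op")
    if operation not in SUPPORTED_OPERATIONS:
        raise ValueError(f"unsupported IO operation: {operation!r}")
    # Copy the affected data so later changes to the IO do not alter the event.
    return Event(operation=operation, data=dict(io.get("object", {})))

event = io_to_event(
    {"op": "add", "object": {"name": "Big League Breakfast", "calories": 1500}}
)
```

Keeping the operation and the affected data together in one record lets a downstream consumer, such as the graph proxy, act on the event without consulting the original IO stream.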

The transmitted event may be received 806 by a graph proxy. The graph proxy may be hosted at a server accessible by a group of clients, although that arrangement is not necessarily required. In some embodiments, the graph proxy may be hosted at a cloud storage or cloud computing site to listen for IOs incoming from one or more clients to the cloud storage site or cloud computing site. More generally, no particular configuration or hosting of the graph proxy is required.

After an event has been received 806 by the graph proxy, the graph proxy may apply one or more schema rules to determine whether an object associated with the event that was received 806 matches one or more schemas. If no match is found between the object and a schema, the method 800 may terminate. On the other hand, if the object is determined to match a schema, the object may be mapped 808 to the schema.
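The schema comparison and mapping 808 might be sketched as follows, assuming, purely for illustration, that a schema is a mapping from required field names to expected Python types. The schema names `menu_item` and `supplier` and the function names are hypothetical.

```python
from typing import Optional

# Hypothetical schemas: required field name -> expected type.
SCHEMAS = {
    "menu_item": {"name": str, "calories": int},
    "supplier": {"name": str, "region": str},
}

def matches_schema(obj: dict, schema: dict) -> bool:
    """True when every field required by the schema is present with the right type."""
    return all(
        field_name in obj and isinstance(obj[field_name], expected_type)
        for field_name, expected_type in schema.items()
    )

def map_to_schema(obj: dict, schemas: dict) -> Optional[str]:
    """Return the name of the first matching schema, or None when nothing matches."""
    for schema_name, schema in schemas.items():
        if matches_schema(obj, schema):
            return schema_name
    return None  # no match: processing of this event may terminate

mapped = map_to_schema({"name": "Big League Breakfast", "calories": 1500}, SCHEMAS)
unmapped = map_to_schema({"title": "untitled"}, SCHEMAS)
```

Returning `None` for an unmatched object corresponds to the case in which the method 800 terminates without touching the graph.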

After the object has been mapped 808 to a schema, one or more rules applicable to the object may be identified 810. A rule may specify, for example, an action that is to be performed with respect to a graph, or graph element, when one or more conditions, which may also be specified in the rule, are met.

Finally, when a determination has been made that a condition specified in a rule has been met, the action specified by that rule may be performed 812. Note that multiple rules, conditions, and actions, may be involved in creating or modifying a graph or creating, modifying, or deleting, a graph element. For example, at 810, multiple rules may be identified, and each of those rules may specify one or more conditions and actions.
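The rule identification 810 and conditional performance 812 might be sketched as shown below. The `Rule` class, the registry keyed by schema name, and the dictionary-based graph are assumptions made for this sketch rather than structures defined by the disclosure.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Rule:
    condition: Callable[[dict], bool]     # evaluated against the object
    action: Callable[[dict, dict], None]  # performed with respect to the graph

def add_vertex(graph: dict, obj: dict) -> None:
    graph["vertices"].add(obj["name"])

def add_high_calorie_edge(graph: dict, obj: dict) -> None:
    graph["edges"].add((obj["name"], "high calorie"))

# Hypothetical registry: the schema an object maps to selects its rules.
RULES_BY_SCHEMA = {
    "menu_item": [
        Rule(condition=lambda obj: True, action=add_vertex),
        Rule(condition=lambda obj: obj["calories"] >= 1000,
             action=add_high_calorie_edge),
    ],
}

def apply_rules(schema_name: str, obj: dict, graph: dict) -> int:
    """Identify the rules for the schema and perform each action whose condition holds."""
    performed = 0
    for rule in RULES_BY_SCHEMA.get(schema_name, []):
        if rule.condition(obj):
            rule.action(graph, obj)
            performed += 1
    return performed

graph = {"vertices": {"high calorie"}, "edges": set()}
heavy = apply_rules("menu_item", {"name": "Big League Breakfast", "calories": 1500}, graph)
light = apply_rules("menu_item", {"name": "Fruit Cup", "calories": 200}, graph)
```

Here two rules fire for the first object and only the vertex rule fires for the second, which illustrates how multiple rules, each with its own condition and action, may contribute to a single graph.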

The method 800 may be performed on an ongoing basis. Alternatively, the method 800 may be performed until an identified stream of IOs has been processed. Further, the system may listen for IOs on an ongoing basis, or during user-specified times, or for IOs from a particular application, or from a particular client.

E. Further Example Embodiments

Following are some further example embodiments of the invention. These are presented only by way of example and are not intended to limit the scope of the invention in any way.

Embodiment 1. A method, comprising: receiving, by a graph proxy, an event from an event generator, wherein the event includes information about an IO and information about data affected by the IO; comparing, by the graph proxy, the data to a schema; when the data is determined by the graph proxy to map to the schema, identifying, by the graph proxy, a rule that is associated with the event, and the rule specifies performance of an action when a condition is met; and when the condition is met, performing, by the graph proxy, the action, and the action is performed with respect to a graph.

Embodiment 2. The method as recited in embodiment 1, wherein the action comprises one of creating, updating, or deleting, an element of the graph.

Embodiment 3. The method as recited in any of embodiments 1-2, wherein the action is performed with respect to an element of the graph, and the element of the graph comprises one of a vertex, an edge, a property of a vertex, or a property of an edge.

Embodiment 4. The method as recited in any of embodiments 1-3, wherein the schema is a parent schema that references a child schema, and the data matches the parent schema and the child schema.

Embodiment 5. The method as recited in embodiment 3, wherein the element is a property, and the action performed by the graph proxy comprises modifying the property.

Embodiment 6. The method as recited in any of embodiments 1-5, wherein the action comprises creating an edge of the graph, and the method further comprises specifying a property for inclusion in the edge.

Embodiment 7. The method as recited in any of embodiments 1-6, wherein the action is performed automatically when the condition is met.

Embodiment 8. The method as recited in any of embodiments 1-7, wherein the schema is identified by implication.

Embodiment 9. The method as recited in any of embodiments 1-8, wherein the schema is identified by an association process in which a schema is derived from the data, and the schema to which the data is compared is the derived schema.

Embodiment 10. The method as recited in any of embodiments 1-9, further comprising creating, a priori based on another rule, an edge in the graph.

Embodiment 11. A method for performing any of the operations, methods, or processes, or any portion of any of these, disclosed herein.

Embodiment 12. A non-transitory storage medium having stored therein instructions that are executable by one or more hardware processors to perform operations comprising the operations of any one or more of embodiments 1-11.
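By way of a non-limiting illustration of Embodiment 4, a parent schema that references a child schema might be checked as sketched below. The convention that a string value in a schema names a child schema, along with the schema names `menu_item` and `nutrition`, is an assumption of this sketch and is not drawn from the disclosure.

```python
# Hypothetical schemas: a type value means "field must have this type",
# while a string value means "field must match the named child schema".
SCHEMAS = {
    "nutrition": {"calories": int},
    "menu_item": {"name": str, "nutrition": "nutrition"},
}

def matches(data, schema_name: str, schemas: dict) -> bool:
    """True when data matches the named schema and every child schema it references."""
    if not isinstance(data, dict):
        return False
    for field_name, expected in schemas[schema_name].items():
        if field_name not in data:
            return False
        value = data[field_name]
        if isinstance(expected, str):
            # Reference to a child schema: check the nested data recursively.
            if not matches(value, expected, schemas):
                return False
        elif not isinstance(value, expected):
            return False
    return True

good = {"name": "Big League Breakfast", "nutrition": {"calories": 1500}}
bad = {"name": "Big League Breakfast", "nutrition": {"calories": "unknown"}}

good_matches = matches(good, "menu_item", SCHEMAS)
bad_matches = matches(bad, "menu_item", SCHEMAS)
```

In this sketch the data maps to the parent schema only if the nested portion also maps to the child schema, which is one possible reading of the data matching both the parent schema and the child schema.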

F. Example Computing Devices and Associated Media

The embodiments disclosed herein may include the use of a special purpose or general-purpose computer including various computer hardware or software modules, as discussed in greater detail below. A computer may include a processor and computer storage media carrying instructions that, when executed by the processor and/or caused to be executed by the processor, perform any one or more of the methods disclosed herein, or any part(s) of any method disclosed.

As indicated above, embodiments within the scope of the present invention also include computer storage media, which are physical media for carrying or having computer-executable instructions or data structures stored thereon. Such computer storage media may be any available physical media that may be accessed by a general purpose or special purpose computer.

By way of example, and not limitation, such computer storage media may comprise hardware storage such as solid state disk/device (SSD), RAM, ROM, EEPROM, CD-ROM, flash memory, phase-change memory (“PCM”), or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other hardware storage devices which may be used to store program code in the form of computer-executable instructions or data structures, which may be accessed and executed by a general-purpose or special-purpose computer system to implement the disclosed functionality of the invention. Combinations of the above should also be included within the scope of computer storage media. Such media are also examples of non-transitory storage media, and non-transitory storage media also embraces cloud-based storage systems and structures, although the scope of the invention is not limited to these examples of non-transitory storage media.

Computer-executable instructions comprise, for example, instructions and data which, when executed, cause a general-purpose computer, special purpose computer, or special purpose processing device to perform a certain function or group of functions. As such, some embodiments of the invention may be downloadable to one or more systems or devices, for example, from a website, mesh topology, or other source. As well, the scope of the invention embraces any hardware system or device that comprises an instance of an application that comprises the disclosed executable instructions.

Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts disclosed herein are disclosed as example forms of implementing the claims.

As used herein, the term ‘module’ or ‘component’ may refer to software objects or routines that execute on the computing system. The different components, modules, engines, and services described herein may be implemented as objects or processes that execute on the computing system, for example, as separate threads. While the system and methods described herein may be implemented in software, implementations in hardware or a combination of software and hardware are also possible and contemplated. In the present disclosure, a ‘computing entity’ may be any computing system as previously defined herein, or any module or combination of modules running on a computing system.

In at least some instances, a hardware processor is provided that is operable to carry out executable instructions for performing a method or process, such as the methods and processes disclosed herein. The hardware processor may or may not comprise an element of other hardware, such as the computing devices and systems disclosed herein.

In terms of computing environments, embodiments of the invention may be performed in client-server environments, whether network or local environments, or in any other suitable environment. Suitable operating environments for at least some embodiments of the invention include cloud computing environments where one or more of a client, server, or other machine may reside and operate in a cloud environment.

With reference briefly now to FIG. 9, any one or more of the entities disclosed, or implied, by FIGS. 1-8 and/or elsewhere herein, may take the form of, or include, or be implemented on, or hosted by, a physical computing device, one example of which is denoted at 900. As well, where any of the aforementioned elements comprise or consist of a virtual machine (VM), that VM may constitute a virtualization of any combination of the physical components disclosed in FIG. 9.

In the example of FIG. 9, the physical computing device 900 includes a memory 902 which may include one, some, or all, of random access memory (RAM), non-volatile memory (NVM) 904 such as NVRAM for example, read-only memory (ROM), and persistent memory, one or more hardware processors 906, non-transitory storage media 908, UI device 910, and data storage 912. One or more of the memory components 902 of the physical computing device 900 may take the form of solid state device (SSD) storage. As well, one or more applications 914 may be provided that comprise instructions executable by one or more hardware processors 906 to perform any of the operations, or portions thereof, disclosed herein.

Such executable instructions may take various forms including, for example, instructions executable to perform any method or portion thereof disclosed herein, and/or executable by/at any of a storage site, whether on-premises at an enterprise, or a cloud computing site, client, datacenter, data protection site, including but not limited to a cloud storage site or backup server, to perform any of the functions disclosed herein. As well, such instructions may be executable to perform any of the other operations and methods, and any portions thereof, disclosed herein.

The present invention may be embodied in other specific forms without departing from its spirit or essential characteristics. The described embodiments are to be considered in all respects only as illustrative and not restrictive. The scope of the invention is, therefore, indicated by the appended claims rather than by the foregoing description. All changes which come within the meaning and range of equivalency of the claims are to be embraced within their scope.

What is claimed is:
1. A method, comprising: receiving, by a graph proxy, an event from an event generator, wherein the event includes information about an IO and information about data affected by the IO; comparing, by the graph proxy, the data to a schema; when the data is determined by the graph proxy to map to the schema, identifying, by the graph proxy, a rule that is associated with the event, and the rule specifies performance of an action when a condition is met; and when the condition is met, performing, by the graph proxy, the action, and the action is performed with respect to a graph.
2. The method as recited in claim 1, wherein the action comprises one of creating, updating, or deleting, an element of the graph.
3. The method as recited in claim 1, wherein the action is performed with respect to an element of the graph, and the element of the graph comprises one of a vertex, an edge, a property of a vertex, or a property of an edge.
4. The method as recited in claim 3, wherein the element is a property, and the action performed by the graph proxy comprises modifying the property.
5. The method as recited in claim 1, wherein the schema is a parent schema that references a child schema, and the data matches the parent schema and the child schema.
6. The method as recited in claim 1, wherein the action comprises creating an edge of the graph, and the method further comprises specifying a property for inclusion in the edge.
7. The method as recited in claim 1, wherein the action is performed automatically when the condition is met.
8. The method as recited in claim 1, wherein the schema is identified by implication.
9. The method as recited in claim 1, wherein the schema is identified by an association process in which a schema is derived from the data, and the schema to which the data is compared is the derived schema.
10. The method as recited in claim 1, further comprising creating, a priori based on another rule, an edge in the graph.
11. A non-transitory storage medium having stored therein instructions that are executable by one or more hardware processors to perform operations comprising: receiving, by a graph proxy, an event from an event generator, wherein the event includes information about an IO and information about data affected by the IO; comparing, by the graph proxy, the data to a schema; when the data is determined by the graph proxy to map to the schema, identifying, by the graph proxy, a rule that is associated with the event, and the rule specifies performance of an action when a condition is met; and when the condition is met, performing, by the graph proxy, the action, and the action is performed with respect to a graph.
12. The non-transitory storage medium as recited in claim 11, wherein the action comprises one of creating, updating, or deleting, an element of the graph.
13. The non-transitory storage medium as recited in claim 11, wherein the action is performed with respect to an element of the graph, and the element of the graph comprises one of a vertex, an edge, a property of a vertex, or a property of an edge.
14. The non-transitory storage medium as recited in claim 13, wherein the element is a property, and the action performed by the graph proxy comprises modifying the property.
15. The non-transitory storage medium as recited in claim 11, wherein the schema is a parent schema that references a child schema, and the data matches the parent schema and the child schema.
16. The non-transitory storage medium as recited in claim 11, wherein the action comprises creating an edge of the graph, and the operations further comprise specifying a property for inclusion in the edge.
17. The non-transitory storage medium as recited in claim 11, wherein the action is performed automatically when the condition is met.
18. The non-transitory storage medium as recited in claim 11, wherein the schema is identified by implication.
19. The non-transitory storage medium as recited in claim 11, wherein the schema is identified by an association process in which a schema is derived from the data, and the schema to which the data is compared is the derived schema.
20. The non-transitory storage medium as recited in claim 11, further comprising creating, a priori based on another rule, an edge in the graph.